OptiTree: Hierarchical Thoughts Generation with Tree Search for LLM Optimization Modeling

Neural Information Processing Systems

Optimization modeling is one of the most crucial but technical parts of operations research (OR). To automate the modeling process, existing works have leveraged large language models (LLMs), prompting them to break down tasks into steps for generating variables, constraints, and objectives. However, due to the highly complex mathematical structures inherent in OR problems, standard fixed-step decomposition often fails to achieve high performance. To address this challenge, we introduce OptiTree, a novel tree search approach designed to enhance modeling capabilities for complex problems through adaptive problem decomposition into simpler subproblems. Specifically, we develop a modeling tree that organizes a wide range of OR problems based on their hierarchical problem taxonomy and complexity, with each node representing a problem category and containing relevant high-level modeling thoughts. Given a problem to model, we recurrently search the tree to identify a series of simpler subproblems and synthesize the global modeling thoughts by adaptively integrating the hierarchical thoughts. Experiments show that OptiTree significantly improves modeling accuracy compared to the state of the art, achieving over 10% improvements on the challenging benchmarks.
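As a rough illustration of the idea in this abstract, the sketch below walks a small taxonomy tree from general to specific categories and concatenates the modeling thoughts found along the way. Everything in it is an assumption made for illustration: the `Node` fields, the keyword-based `matches` test (a stand-in for whatever matching the authors actually use, which the abstract does not specify), and the toy tree contents. It is not the OptiTree implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One problem category in a hypothetical modeling tree."""
    category: str
    thoughts: str                      # high-level modeling hints for this category
    keywords: List[str]                # assumption: keyword match stands in for real matching
    children: List["Node"] = field(default_factory=list)


def matches(node: Node, problem: str) -> bool:
    """Toy matcher: every keyword of the node must appear in the problem text."""
    text = problem.lower()
    return all(k in text for k in node.keywords)


def search(root: Node, problem: str) -> List[Node]:
    """Descend from the root, following the first matching child at each level."""
    path = [root]
    node = root
    while True:
        nxt = next((c for c in node.children if matches(c, problem)), None)
        if nxt is None:
            return path
        path.append(nxt)
        node = nxt


def synthesize(path: List[Node]) -> str:
    """Combine the thoughts along the path, from general to specific."""
    return "\n".join(f"[{n.category}] {n.thoughts}" for n in path)


# Hypothetical three-level tree.
capacitated = Node("Capacitated routing", "Add load variables and capacity constraints.",
                   ["route", "capacity"])
routing = Node("Routing", "Use binary arc variables and flow conservation.",
               ["route"], [capacitated])
scheduling = Node("Scheduling", "Use start-time variables and precedence constraints.",
                  ["schedule"])
root = Node("Optimization problem", "Identify variables, constraints, objective.",
            [], [routing, scheduling])

problem = "Plan a route for trucks with limited capacity to visit all customers."
path = search(root, problem)
print(synthesize(path))
```

The greedy first-match descent is only the simplest choice; a real system would presumably need a more robust matching step and a way to handle problems that fit several categories.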


Self-Challenging Language Model Agents

Neural Information Processing Systems

Large language models are quickly becoming the foundation for intelligent agents that are capable of using tools. However, training such agents is challenging because it requires human creation and annotation of a diverse set of tasks, tools, and evaluation criteria. In this paper, we propose the Self-Challenging framework for training an agent on high-quality tasks that are generated by itself. The agent first plays the role of challenger and generates a task after interacting with the given tools. The tasks take the form of a novel general class of problems termed Code-as-Task, which are defined by an instruction, a verification function, and solution and failure cases that serve as tests, making it possible to keep only high-quality tasks. The agent then takes an executor role and trains on those tasks with reinforcement learning using the evaluation feedback as a reward. Evaluation on two existing multi-turn tool-use agent benchmarks, M3ToolEval and TauBench, shows the Self-Challenging framework achieves over a two-fold improvement in Llama-3.1-8B-Instruct,
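The Code-as-Task structure described in this abstract can be pictured with a minimal sketch: a task bundles an instruction, a verification function, a reference solution, and failure cases, and a task is kept only if the verifier accepts the solution and rejects every failure case. The class name, field names, and the string-valued answers below are assumptions for illustration; the abstract does not give the actual data format or filtering rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodeAsTask:
    """Hypothetical container for one self-generated task."""
    instruction: str
    verify: Callable[[str], bool]   # verification function over a candidate answer
    solution: str                   # answer the verifier is expected to accept
    failure_cases: List[str]        # answers the verifier is expected to reject


def is_high_quality(task: CodeAsTask) -> bool:
    """Keep a task only if its own tests agree with its verifier."""
    if not task.verify(task.solution):
        return False
    return not any(task.verify(bad) for bad in task.failure_cases)


def filter_tasks(tasks: List[CodeAsTask]) -> List[CodeAsTask]:
    return [t for t in tasks if is_high_quality(t)]


good = CodeAsTask(
    instruction="Return the total price of 3 items costing 4 each.",
    verify=lambda answer: answer.strip() == "12",
    solution="12",
    failure_cases=["7", "34", ""],
)
# A verifier that accepts anything cannot distinguish success from failure.
sloppy = CodeAsTask(
    instruction="Return any number.",
    verify=lambda answer: True,
    solution="1",
    failure_cases=["not a number"],
)

kept = filter_tasks([good, sloppy])
print([t.instruction for t in kept])
```

In an executor phase, the same `verify` function could also supply the reward signal, which is consistent with the abstract's mention of using evaluation feedback as a reward.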


TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability

Neural Information Processing Systems

Vehicle GPS trajectories provide valuable movement information that supports various downstream tasks and applications. A desirable trajectory learning model should be able to transfer across regions and tasks without retraining, avoiding the need to maintain multiple specialized models and subpar performance with limited training data. However, each region has its unique spatial features and contexts, which are reflected in vehicle movement patterns and are difficult to generalize. Additionally, transferring across different tasks faces technical challenges due to the varying input-output structures required for each task. Existing efforts towards transferability primarily involve learning embedding vectors for trajectories, which perform poorly in region transfer and require retraining of prediction modules for task transfer. To address these challenges, we propose TransferTraj, a vehicle GPS trajectory learning model that excels in both region and task transferability.


Scientists Invent a Way to Brew Espresso With Ultrasonic Waves--No Hot Water Required

WIRED

Researchers have demonstrated they can make coffee comparable to conventional espresso using ultrasonic waves. Because the process doesn't need hot water, it consumes 75 percent less energy. What do you need to make a good espresso? Ground coffee, of course; a machine capable of generating pressure; and hot water, preferably heated to between 195 and 205 degrees Fahrenheit. But could one perhaps do without that last element?


Tourist dies in Dominican Republic luxury resort fire

BBC News

A huge fire at a luxury beach resort in the Dominican Republic killed one woman and forced nearly 1,700 guests to be evacuated on Friday. In a statement to local media, the DAEH emergency services said that a 46-year-old Italian tourist died, three people were taken to medical facilities and six others were treated on site. Drone footage shows how widespread the fire was, with buildings spanning the Viva Wyndham Dominicus Beach in Bayahibe on fire and thick black smoke billowing into the air. What sparked the early-morning blaze is not yet known, but an initial investigation found the flames spread quickly due to wind conditions and the flammable nature of the thatched roofs on some buildings. The country's Emergency Operations Center (COE) said the fire had been brought under control and guests had been moved to other hotels. It added that tourist activities in the town and surrounding area have been unaffected and can continue as normal.



In 1962 Wisconsin, delivery pizzas were cooked in traffic

Popular Science

Mobile kitchens ensured that pizzas arrived piping hot. In 1962, Pizza on Wheels aimed to deliver restaurant-fresh pizza straight from the oven.


Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints

Neural Information Processing Systems

Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning is characterized by the need to synthesize a broad spectrum of parallel and potentially conflicting information and constraints. For example, in travel planning scenarios, it requires the integration of diverse real-world information and user preferences.


URB - Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles

Neural Information Processing Systems

Connected Autonomous Vehicles (CAVs) promise to reduce congestion in future urban networks, potentially by optimizing their routing decisions. Unlike for human drivers, these decisions can be made with collective, data-driven policies, developed using machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing.


Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

Neural Information Processing Systems

Automated interpretability research aims to identify concepts encoded in neural network features to enhance human understanding of model behavior. Within the context of large language models (LLMs) for natural language processing (NLP), current automated neuron-level feature description methods face two key challenges: limited robustness and the assumption that each neuron encodes a single concept (monosemanticity), despite increasing evidence of polysemanticity. This assumption restricts the expressiveness of feature descriptions and limits their ability to capture the full range of behaviors encoded in model internals. To address this, we introduce Polysemantic FeatuRe Identification and Scoring Method (PRISM), a novel framework specifically designed to capture the complexity of features in LLMs. Unlike approaches that assign a single description per neuron, common in many automated interpretability methods in NLP, PRISM produces more nuanced descriptions that account for both monosemantic and polysemantic behavior. We apply PRISM to LLMs and, through extensive benchmarking against existing methods, demonstrate that our approach produces more accurate and faithful feature descriptions, improving both overall description quality (via a description score) and the ability to capture distinct concepts when polysemanticity is present (via a polysemanticity score).